Ee382c paper
__TOC__ Owners Paul, Ozen, Zheng, and Neeraj TO-DO * Review the paper guidelines guidelines * Continue to work up Introduction, Problem, Solution, and Critique sections (~6500 words = 10 pages. Currently ~3256/6500 words --Polkeem 15:36, 6 May 2007 (UTC)) Topic Flow Control Buffer Allocation Introduction --Polkeem As technology scales in communication networks, buffer allocation becomes ever more critical. This paper explores various proposals for buffer allocation related to flow control. A brief description of the problem is provided, followed by some proposed solutions. The solutions are then analyzed to identify their strengths and weaknesses. Finally, a critique of the topic is provided. --Neerajp Channels are expensive resources and packet traversing between two points could block the channel due to lack of resources at receiver or intermediate points. This problem is unique to networks that use wormhole routing (i.e. attempt to save space). Wormhole routing is a technique in which complete buffer allocation is not made before forwarding the packet. The buffer allocation and flow control is done at the flit level. The wormhole routing reduces the latency of packet, requires less space. The overhead includes VC identification filed in the flits; which adds to extra bits in the flit contents. However wormhole routing technique leads to unrelated traffic being blocked causing lower throughput. The link by link cumulative effect of such blocking can severely affect the network throughput. The problem is unique to networks that use wormhole routing. Traditional packet switch networks that perform flow control at the packet level never block partially transmitted packets. This can be solved by adding virtual channels. Virtual channels decouple allocation of buffer space from channel bandwidth. This allows multiple stored packets (parts of packets called flits) to compete for bandwidth of single physical channel. If a packet is blocked then another packet could bypass it using the physical channel The Problem Describe the problem: What problem are the papers you have chosen addressing? What are the constraints? How does one measure the quality of a solution? What makes this problem hard? What are the key issues? --Polkeem 05:37, 2 May 2007 (UTC) Fill in something here. I think the quality of the solution should be measured by the less number of buffers it takes to achieve similar throughput and latency as a different solution like virtual channel flow control. For flit-reservation Flow Control, this measure is well defined in the paper. This may not be the case for other papers. --Zhengli 04:25, 3 May 2007 (UTC)I suppose that we'd better describe the problem in formula: Given: Topology, Traffic Pattern, Routing algorithm(for simplicity, we can assume it is deterministic), Number of buffer space (Budget) Determine: for virtual channel flow control, how the buffer space should allocate to each router; how one router allocate its buffer quota to each input port; how each input port allocate its quota to each virtual channel Such that: The throughput of the whole network is Maximize Or, the average packet latency is Minimized --Neerajp Given a fixed buffer size, the problem is to decide the extent of virtualization and control overhead associated with it. Virtual channel can also provide a degree of freedom in allocating resources and thus various schemes to route higher priority traffic and to reduce jitter or latency of packets could be used. --Ozen 05:26, 6 May 2007 (UTC) Most of the studies about virtual channels have either not considered virtual channels altogether or have assumed a fixed buffer size for each virtual channel. Therefore, adding more virtual channel to the router improves the performance. But this is not the case for real systems. In reality, we have a fixed buffer length for physical channels and if we add a virtual channel to this physical channel, the buffer size is shared between these virtual channels. So, there is a trade off between adding more virtual channels and the buffer size associated to these virtual channels. Proposed Solutions Describe the solutions: What contribution does each of your papers make toward the solution of this problem? What are their strong points? What are their weak points? What remains to be done? Flit Reservation Flow Control 2 In flit-reservation flow control, control flits traverse the network in advance of data flits, reserving buffers ahead of time. Scheduling ahead of data arrival allows buffers to be held only when the buffers are in actual use, unlike other flow control methods. The paper claims that with just 6 buffers, the throughput of flit-reservation flow control (77%) approaches that of virtual-channel flow control using 16 buffers (80%), suggesting a significant buffer savings as a result of efficient buffer utilization. Control flits traverse the network ahead of data flits to reserve buffer space before the data flits arrive. When the data flits do arrive, they are buffered according to the precalculated reservation. The advanced scheduling enabled by flit reservation allows buffers to be used immediately after departure of the flit, as opposed to other flow control methods where buffers cannot be freed until downstream status information (i.e. credits) are sent back, resulting in a round-trip latency where the buffer cannot be used for other data. One of the benefits pointed out about flit reservation flow control is that for a fixed amount of buffer space, saturation throughput is increased because of more efficient buffer scheduling. The paper claims that for an 8x8 mesh network with eight flit buffers per input, virtual-channel flow control 1 saturates at 63% of the bisection bandwidth while flit-reservation flow control achieves a performance of 77% of bandwidth, resulting in a 20% improvement with equal buffer storage. A clear strength of flit-reservation flow control is apparent in the comparison made between buffer utilization of wormhole and virtual-channel flow control and flit-reservation flow control. In the first 2 allocation schemes, buffers are used according to the following figure: Buffer Turnaround Time of Wormhole and Virtual-Channel Flow Control The buffer is first allocated and held from the time the data flit departs the current node, to when it is processed by the next node, to the time the flow control signal returns to inform the current node that the buffer can be released. Only after this turnaround can the buffer be reused. So the turnaround time of the buffer is at least the sum of the propagation delay of the data flit in the forward direction and the flow control back. In flit-reservation flow control, this turnaround time is essentially zero. Another benefit of flit-reservation flow control is described as providing similar advantages of statically-scheduled flow control, where the flow of data is 'compiled' before runtime. Flit-reservation flow control provides an added benefit in that it supports the flexibility of a dynamically routed network. Instead of scheduling resources at compile time, the scheduling decisions are made when data packets are injected, allowing the schedule to adapt to different traffic patterns. Both methods use buffers efficiently, with immediate turnaround and routing and arbitration latency hidden, but flit-reservation flow control doesn't sacrifice flexibility. A drawback of flit-reservation flow control stems from its usage of a shared pool of buffers. If the buffer pool is small relative to the length of the packets and a packet is blocked, all it's data flits are stalled and hold buffers. The situation becomes worse with an increasing length in packets. So when the buffer pool size is small relative to the packet length, active packets cannot bypass blocked packets. A thorough description of the architecture and experimental results are well documented in this paper. This type of buffer allocation can be used in other types of datapath architectures. It is similar to scoreboarding, where the control packets serve as scorecards that are passed along the different nodes to allocate resources. The paper did fail to fully describe the amount of changes needed to implement error recovery. If there is a severe penalty in resources to provide data integrity, that could offset the resource savings in going with this buffer allocation scheme. With the way designs are scaling nowadays, data integrity and error recovery is becoming more important. Virtual Channels Planning for Networks-on-chip 3 Basic idea The virtual channel planning is a unique problem for on-chip interconnection networks, where is channel bandwidth is comparatively abundant, but the memory is relatively expensive. Besides the tight buffer budget, on-chip networks are always application oriented, which mean we could estimate the traffic patten ahead of time, which means we could know the hot spot of the network when we design the network. Since lots of nodes in the network is not congested as others, if we uniformly allocate virtual channels to each node, there will be a waste of limited resouces. As a result, the papper propose a way to allocate the virtual channels to different nodes so that some physical links should use more virtual channels, and some less, just like the following graph for a mesh network. It is well known that in the mesh network for a uniform random traffic, the network is crowded in the center, but the traffic in the side area are not so heavy. So planning of the virtual channels should be focus in the center area, as shown by the graph. Here are sevral assumptions of the paper: 1. We could know the traffic pattern ahead of time, or at least estimate the traffic pattern pretty well. 2. Router designs are scalbe, which means the router is easy to transform from 5x5 to like 10x10, without a obvious performance decrease. 3. It is base on virtual channel flow control, which means the HOL(head of line blocking) is the main source of the original network, and the congestion could be solved by adding virtual channels. 4. The routing algorithm is determistic. For adaptive routing, the routing could sense the network state and go round the congested area. 5. Buffer is an relative expensive resource in the network. 6. Once the VC planning is finished, it could not change adaptively, either in the network, or in a node. Actullay, the paper never deals with the buffer allocation in one node. The paper focus on buffer allocation problem in a network level, it allocate virtual channels, which is buffer resouces, to different nodes in the network. The paper assume all the virtual channels are fix buffer size length, say m flit. So it convert the virtual channel planning problems into a problem, assign a buget of virtual channels to nodes in the networks. Implemention Approach Since the paper assume that all the congestion could be sovled by allocating a virtual channel, a fix depth of buffers, the implemention approach is how to allocate theses virtual channels. The paper begins by assuming there is only one virtual channels i.e. the physical channel for every node connection. Then the paper use a greedy alogritm to allocate the virtual channel budget. 1. Find the bottleneck: find out which channel is the most busy channel by estimating average switch bandwidth; 2. Add a virtual channel to this channel, which could increase the throughput of this link; 3. Go to 1, until the virtual channel budget is used out. The paper draw this process in a picture as follows: The most interesting part of the paper lies in its analyzing the network to get the most congested linked, which is the first step of the algorithm. The paper use a probabilistic model. 1. Get the packet injection rate from the traffic pattern. \lambda 2. Calculate the traffic flowing rate of each port in each router. \Phi 3. Find out the probability that traffic flow is blocked in each port, each router B 4. Find out the expected contention probalility for flits at input. 5. Get the bandwidth ultilization of each channel. Result The paper report that it could archieve a comparative latency using less buffer. Look at the picture above, for a 4x4 mesh network with a hotspot traffic. When add only one channel to the network, the throughput increases about 12.1%, and if add 10 extra VCs using the paper's algorithm, it could archieved the throughput uniformly allocate 2 VCs to each channel. Let's calculate the saving on buffers. Assume each VC is m flits deep. 4x4 mesh has 48 channels, 2VCs to each channel is 96m flits. 10 extra VCs are 58m flits in total, which save half the buffers and half the areas for memory. The paper also shows the result of saving buffers for a real application. Critique By customized VC allocation, the paper provide a way to save buffers without decreasing the performance in interconnection network, in another way to improve performance using a limited buffer buget. The VC planing is limited by the unstated assumptions which we have mentioned previously. 1. Network designers should know the traffic pattern ahead of time. This might not be the usual case in general multiprocessor design. Moreover, if we can know the traffic pattern well enough, we might design a better topology rather than wasting time on allocating buffers in a regular topology. So this vc planing is kind limited to the limitation of application. 2. The VC planning require a scalable router architecture, which require the router to be simple. The paper didn't account in the complexity to build different routers for different nodes, and the impact of imbanlance router. How to design a flexible router is really a interesting topic. 3. In the paper, the algorithm is greedy, which may not come to a global optimize solution for virtual channel planning. So the result might not be the most optimized one, and there is still room for improving the algorithm. Dally's 92 paper --Neerajparik 06:19, 4 May 2007 (UTC) Virtual channels design must aim at utilizing the channels to the highest possible extent and also must target at silicon space usage. The silicon space is required for virtual channel control logic and for switching fabric complexity. The proposed solution in the paper discusses over head in terms of fixed segmentation of the buffer. One must consider dynamic allocation of these buffers to different Virtual Channels. This would increase buffer utilization efficiency as well as provide the basic framework for implementing the different schemes. This will however increase the control logic overhead, FF overhead. !!Calculate new value based on certain assumption!! Virtual channels were initially used to avoid deadlock in the torus routing chip. A cyclic network with resource dependencies can be made acyclic by routing through appropriate virtual channels. The output queuing in switches provides only single stage resource decoupling. With output queuing there is still a single buffer associated with the output; a long packet that will not fit in single node could still flow into the input thus blocking it. Tamir showed that output queued switches could be looked as split input queued switches. Other use of virtual channels was guaranteeing bandwidth to circuits as in iWARP. Virtual channels can be used in both input and output queue switches. In virtual channel flow control, buffers are allocated to each virtual channel. These buffers could be allocated independently and on-demand basis. A packet is assigned a VC and owns that VC until it is completely transferred. The packet makes progress using virtual channel flow control performed at the flit level. The transmitting node keeps state information like count of free buffers, VC assignment and priority pf packet. The receiving node also keeps VC assignment, input and output pointers for each lane buffer and state of the channel (waiting , active, idle). Adding virtual channels also increase the complexity of switch design. The problem at switching is now to treat these virtual channels at input and outputs as separate channels or multiplexed physical channels. Treat virtual channels as separate physical channels (also called de multiplexed) increases the complexity of the switch. Treating Virtual Channels as multiplexed channels leads to complexity in managing flow control and arbitration. However having the multiplexed output gives reasonable performance and flow control complexity. Multiplexing at any level would reduce performance i.e. can cause unnecessary blocking (??Cite example of this for output blocking??) but reduced number of outputs than inputs would not affect performance of fabric to that extent as one of the possible inputs can still be competing for output. The arbitration must take flow control information for that Virtual channel into account. Prioritized demand multiplexing --Neerajparik 06:19, 4 May 2007 (UTC) The prioritized division multiplexing (PDM) paper discusses a scheme to prioritize packets through the network. In this scheme multiple cycles are allocated to the flits of higher priority traffic to reduce latency of such traffic through the network. see critique section below; I have written description of solution and critique togather. Another point i have not mentioned in critique below but will like to discuss, when we meet is, the authors of paper could have changed the flit units i.e. increased flit units for the higher priority data. Multimedia router --Neerajparik 06:19, 4 May 2007 (UTC) Even with prioritization in place the virtual channel based network cannot deliver traffic with low-jitter and deterministic data rate. Thus a hybrid network that supports both the circuit switching and the virtual channels can be designed. In this virtual channel based network, one can provide all three kind of service namely CBR, VBR and best effort. NOTE: This is not complete,yet. I need to add more ... may be sunday else wednesday .... The Effect of Virtual Channel Organization on the Performance of Interconnection Networks --Ozen 06:13, 6 May 2007 (UTC) Increasing the buffer depth and therefore decreasing the number of virtual channels, results in better performance for low traffic rates. Because, in low traffic loads, there are sufficient number of virtual channels to handle this traffic and the important thing is the depth of the buffer. For high traffic rates, incrasing the number of virtual channels can improve the performance of the network under uniform traffic. However, if we use matrix transpose and hotspot traffic, deeper buffers for virtual channels causes performance improvement. Besides, assigning a fixed number of virtual channels to each dimension outperforms assigning different number of virtual channels to each dimension. Critique of the Topic Write a critique of the topic: Is the problem the right one? Is there a way to recast the problem to accomplish the same end in an easier way? Do the solutions address the critical issues associated with this topic? Are there any key issues that have not been addressed? How would you address the problem? Would you use one of these solutions or would you take a different approach? --Zhengli 05:22, 3 May 2007 (UTC)For virtual channel Planning for Networks-on-Chip, the idea is very interesting, but seems the evaluation of contention is not in a good way, and it overestimate the complexity to build a cutomized router. --Polkeem 05:41, 3 May 2007 (UTC) For this Critique section, I think it is saying we need to critique the topic we've chosen, and not really critique the papers we've read so much. Even though we are supposed to show how the papers address the problem we've stated, I think we're supposed to use these papers as tools to discuss the problem more and solutions to the problems. Actual critique of the paper seems like it should go into the corresponding 'Solutions' sections. What do you guys think? --Neerajparik 06:18, 4 May 2007 (UTC)The Prioritized demand multiplexing scheme discussed provides more bandwidth per round robin cycle to the flits from higher priority. The higher priority is carried as a field in the head of the packet. The header-field also determines the number of back-to-back cycles assigned per round robin cycle. The scheme implements a PDM counter which keeps count of the number of cycles as the packet is being transferred. This PDM counter is implemented per Virtual channel and leads to logic overhead. Depending on the complexity of counter and pipeline cycles, this could lead to bubbles in the output cycles and slower cycle time of the router. This scheme does help to reduce the latency of the higher priority traffic. The PDM uses a simple buffer allocation mechanism in which each virtual channel is assigned a fixed number of flit buffers. The scheme relies on fast siphoning of higher priority buffers and thus fast reverses flow of credits to prioritize service. A subtle point to note here is that this fast reverse flow of credits for higher priority traffic tends to reduce bubbles in the higher priority data (bubbles are introduced due to varying number of virtual channels along the path of packet). A simpler scheme in which only one higher priority class is specified, the packets of this class will get consecutive back-to-back cycles clipped to some max-number. The max number is required for fairness and preventing higher priority packets from hogging network. This scheme is very simple to implement, this doesn't hit cycle time much and services almost all the application requirements. In this scheme, higher priority packets get back-to-back cycles allocated to its flits either until it is empty or has reached the threshold. Also it has been shown that having higher number of priority classes the difference in average delay after 3-4 classes is minimal and doesn't gain any performance due to inherent jitter associated with the delays. --Ozen 06:20, 6 May 2007 (UTC) The paper is straightforward and express the ideas clearly. The problem is to find the optimal number of virtual channels to assign to the physical channel. It gives detail simulation results for the trade off between the buffer depth and the number of virtual channels for varying traffic rate and traffic pattern. --Zhengli 06:39, 10 May 2007 (UTC) The buffer allocation problem is a special problem for buffer flow control such as worewhole, store and forward, and etc. Becasue people usually adopt virtual channel flow control in buffer flow control. The buffer allocation problem could come to how to allocate buffers into different channels. This is a good problem, especially in some case when the buffer resouces is precious, such as on-chip enviroment. The essence of buffer allocation is to put buffer to where it could be ultilized to improve network throughput. There are two levels of buffer allocation problem: 1. How to allocate buffer buget in a network to different channels. We can assume all routers take input queue architeucte, this problem becomes, how to allocate buffer to different routers in the whole chip. The basic instinct is to allocate to those who need it. 2. How to allocate buffer buget inside a router. For a router, there are several input ports, and for each port there are several vritual channels. The problem is how to allocate these buffers. Besides the two levels, there are two ways of allocation: static and dynamic. For the first level of buffer allocation, there may be only one way, static, because we cannot transfer buffers from one router to another dynamically. But the for the second level, the buffer could be allocate dynamically, which will archieve a better performance. The paper 3 solve the first level problem in a satic way, but based on some assumptions. It also arise the problem to design a flexible router. If the global buffer is limited, I might use this approach in a smarter way, which means using a better alogrithm, and limited it's cause to over imbanlanced router design. Recent trends Some other issues (not from papers ….. we can use these for conclusion may be): Virtual channel is silicon-space sensitive, so credit based schemes are mostly used. Credit based scheme reduce the buffering requirements at node to minimum in case performance of network is not a concern. But credit based schemes require a certain percentage of channel bandwidth to be allocated to credits or demand separate channel for credits. With buffering being ample and not a concern many of the links now use on-off control by over provisioning the network interface with enough buffering. Cite example: such as Interlaken Most of the today’s high speed serial links are moving away from wormhole routing this is because the serialization federalization latency is high. This increases the round trip time in the network to an extent that complete packet can be transferred in that amount of time. The reason to move to high speed serial links is its ability long distance and they use LVDS signaling which consumes less power also. Many of these links is used as serial-parallel links, in which multiple serial lanes form a link. The serial lanes are de-skewed within chip to do lane-to-lane alignment. All these things add to round trip latency thus moving the interconnection networks to the virtual cut through schemes.Cite example: PCI-Express Too much virtualization could also lead to excessive bubble insertions in the packets thus lowering throughput. The bubble insertion is due to cumulative effect of flow control propagation latency and inherent control logic bubbles (complex control logic with multiple input Virtual channels arbitrating for outputs could introduce bubbles). Bubble insertion is due to the fact that packet is segmented into group of flits and number of virtual channels available throughout the network (either by design or being active Virtual channels) can change on hop-by-hop basis thus inserting bubbles. To reduce such bubbling overhead it may be necessary to have buffers bigger than just the round trip time. Confusion (Delete Me) A virtual channel consists of a buffer that can hold one or more flits of a packet and associated state information. Several virtual channels may share the bandwidth of a single physical channel.1 When we mention "virtual channel", it means a kind of flow control method in interconnection networks, not a term in telecommnuication. See the Confused definition References http://cag.csail.mit.edu/PSRG/papers/dally-ieee92.pdf W. J. Dally, "Virtual Channel Flow Control", March 1992 http://cva.stanford.edu/publications/1999/flit_rsvn.pdf L. Peh, W. J. Dally, "Flit Reservation Flow Control", 1999 http://doi.ieeecomputersociety.org/10.1109/ISQED.2007.169Ting-Chun Huang, Umit Y. Ogras, Radu Marculescu, "Virtual Channels Planning for Networks-on-Chip" 4. Abdel-Halim Smai, Dhabaleswar K. Panda, Lars-Erik Thorelli, "Prioritized Demand Multiplexing (PDM): A Low-Latency Virtual Channel Flow Control Framework for Prioritized Traffic" 5. Jose Duato1, Sudhakar Yalamanchili2, M. Blanca Caminero3, Damon Love2, Francisco J. Quiles3, "MMR: A High-Performance Multimedia Router - Architecture and Design Trade-Offs" 6. Chrysostomos A. Nicopoulos, Dongkook Park, Jongman Kim, N. Vijaykrishnan, Mazin S. Yousif , Chita R. Das ViChaR: "A Dynamic Virtual Channel Regulator for Network-on-Chip Routers" 7. J. Hu, R. Marculescu, "Application-Specific Buffer Space Allocation for Networks-on-Chip Router Design"